WE CLAIM:

1. A method for identifying a novel nucleic acid molecule encoding a
protein of interest comprising:

(i) selecting a specific protein from a first species involved in a
regulatory network of interest;

(ii) identifying known proteins that act upstream and
downstream in the regulatory network of interest with respect
to the specific protein selected;

(iii) constructing the regulatory network of interest from the
proteins identified in step (ii);

(iv) for each identified protein, selecting a domain or motif and
searching by homology for related proteins in a second species,
wherein a related protein is defined as a protein having a
homologous domain or motif;

(v) producing a regulatory network for the second species,
wherein said regulatory network incorporates the identified
related proteins;
(vi) comparing the regulatory network from the first species to
the regulatory network of said second species;

(vii) identifying a protein present in a regulatory network for one
species but absent in the regulatory network of the other
species; and

(viii) isolating a nucleic acid molecule encoding the protein
identified in step (vii) in the species in which it is missing.
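The comparison of the two regulatory networks recited in claim 1 can be illustrated with a minimal sketch. It assumes each network is stored as a mapping from a regulator to its targets and that homology search has already produced a protein-to-homolog table; the function names and protein labels are hypothetical placeholders, not part of the claimed method.

```python
def members(network):
    """Collect every protein named in a network given as {regulator: {targets}}."""
    nodes = set(network)
    for targets in network.values():
        nodes.update(targets)
    return nodes


def absent_in_second(first, second, homolog_of):
    """Proteins of the first network whose homolog is not in the second network."""
    second_nodes = members(second)
    return sorted(p for p in members(first) if homolog_of.get(p) not in second_nodes)


# Hypothetical toy data: protein names are placeholders, not real genes.
first_species = {"A1": {"B1"}, "B1": {"C1"}}
second_species = {"A2": {"B2"}}
homolog_of = {"A1": "A2", "B1": "B2", "C1": "C2"}

candidates = absent_in_second(first_species, second_species, homolog_of)
print(candidates)  # ['C1']
```

Each protein returned is a candidate whose counterpart in the second species has not yet been placed in that species' network.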

2. The method of claim 1 wherein the nucleic acid molecule encodes a
human protein.

3. The method of claim 1 wherein the related proteins are orthologs.

4. The method of claim 1 wherein the regulatory pathway is involved in
apoptosis.

5. The method of claim 1 wherein the specific protein from the first
species is involved in tumor suppression.
6. A method for identifying the effect of a gene knockout on a regulatory
pathway comprising the following steps:

(i) identification of the shortest non-oriented pathway
connecting two gene products;

(ii) assigning an initial sign value of "-" to the knockout since the
knockout gene product is inactive;

(iii) moving along the shortest pathway between the two gene
products multiplying the sign with the sign of the next gene
product in the pathway, wherein "-" stands for inhibition,
"+" stands for induction or activation, and "0" stands for the lack
of interaction between two proteins in the specified direction;
and

(iv) determining the final sign at the end of the pathway, wherein
"-" indicates inhibition and "+" indicates induction or
activation of the pathway.
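The sign-multiplication procedure of claim 6 can be sketched as follows. This is an illustration only: it assumes the pathway is stored as a dictionary of directed, signed interactions (+1 for activation, -1 for inhibition), uses breadth-first search to find the shortest non-oriented path, and treats a step with no interaction in the travelled direction as 0. All names are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search that ignores edge direction (non-oriented path)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def knockout_effect(signs, knocked_out, target):
    """Return -1 (inhibition), +1 (activation) or 0 (no directed interaction)."""
    path = shortest_path(signs.keys(), knocked_out, target)
    if path is None:
        return 0
    sign = -1  # the knocked-out gene product is inactive
    for u, v in zip(path, path[1:]):
        sign *= signs.get((u, v), 0)  # 0 when no interaction in this direction
    return sign


# Hypothetical pathway: A activates B, B inhibits C.
signs = {("A", "B"): +1, ("B", "C"): -1}
print(knockout_effect(signs, "A", "C"))  # 1
print(knockout_effect(signs, "C", "A"))  # 0
```

In the toy pathway, knocking out A removes the activation of B, which in turn relieves the inhibition of C, so the final sign is positive.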

7. A method for identifying a novel nucleic acid molecule encoding a 
protein of interest comprising: 
(i) selecting a gene of interest and searching a database for
homologous sequences;

(ii) aligning the homologous sequences identified in step (i);

(iii) constructing a gene tree using the sequence alignment; 

(iv) constructing a species tree; 

(v) inputting the species tree and gene tree into an algorithm
which integrates the species tree and the gene tree into a 
reconciled tree; and 

(vi) identifying orthologous genes present in one species but 
missing in another. 



8. The method of claim 7 wherein the following algorithm is used to
integrate the species tree and the gene tree into a reconciled tree:

(i) computing the similarity $\sigma(S_{gi}, S_{sj})$ for each pair of interior
nodes from trees $T_g$ and $T_s$;

(ii) finding the maximum $\sigma(S_{gi}, S_{sj})$;

(iii) saving $S_{gi}$ as a new cluster of orthologs, and saving $\{S_{sj}\} - \{S_{gi}\}$ as
a set of species that are likely to have gene of this kind (or
lost it in evolution);

(iv) eliminating $S_{gi}$ from $T_g$: $T_g := T_g \setminus S_{gi}$;

(v) repeating steps (ii)-(iv) until $T_g$ is non-empty.
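The greedy matching of claim 8 can be sketched under several stated assumptions. The claim does not define the similarity measure, so Jaccard similarity between species sets is used here as a stand-in. Each interior node is represented by the set of leaves below it, eliminating a gene-tree node is read as also removing its descendant clades, and the loop is read as continuing until the gene tree is exhausted. All identifiers and the toy data are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity measure: shared species over all species."""
    return len(a & b) / len(a | b)


def greedy_reconcile(gene_clades, species_clades, species_of):
    """Pair gene-tree clades with species-tree clades, best match first.

    Returns (ortholog cluster, species expected but missing) pairs.
    """
    remaining = list(gene_clades)
    results = []
    while remaining and species_clades:
        # Steps (i)-(ii): score every pair and keep the maximum.
        _, gc, sc = max(
            ((jaccard({species_of[g] for g in gc}, sc), gc, sc)
             for gc in remaining for sc in species_clades),
            key=lambda item: item[0],
        )
        # Step (iii): save the cluster and the species lacking a member.
        present = {species_of[g] for g in gc}
        results.append((sorted(gc), sorted(sc - present)))
        # Step (iv): drop the chosen clade and the clades nested inside it.
        remaining = [c for c in remaining if not c <= gc]
    return results


# Hypothetical gene family with two paralogous groups.
species_of = {"h1": "human", "m1": "mouse", "f1": "fly", "h2": "human", "f2": "fly"}
gene_clades = [frozenset({"h1", "m1", "f1"}), frozenset({"h1", "m1"}), frozenset({"h2", "f2"})]
species_clades = [frozenset({"human", "mouse"}), frozenset({"human", "mouse", "fly"})]

for cluster, missing in greedy_reconcile(gene_clades, species_clades, species_of):
    print(cluster, missing)
```

In this toy input the second cluster lacks a mouse member, which marks mouse as a species that either still has an unidentified gene of this kind or lost it in evolution.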





9. A method for identifying a novel gene comprising the following
steps:

(i) defining a motif or domain composition of a gene of interest;

(ii) searching for sequences which correspond to nucleotide
sequences in an expressed sequence tag database or other
cDNA databases using a program such as BLAST and
retrieving the identified sequences;

(iii) searching additional databases for expressed sequence tags
containing the domains and motifs characteristic for
the gene of interest with a Hidden Markov Model of the domains
and motifs identified in step (i);

(iv) identifying nucleotide sequences comprising the gene of
interest.



10. The method of claim 9 further comprising using each identified
expressed sequence tag to search sequence databases for
overlapping sequences for the purpose of assembling longer
overlapping stretches of DNA.
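The assembly of overlapping sequences recited in claim 10 can be illustrated with a minimal sketch. It assumes exact suffix-to-prefix overlaps, extends the seed only to the right, and works on short made-up sequences; real expressed sequence tags would require tolerance for mismatches and both strands. The function names and the minimum overlap length are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def extend(seed, candidates, min_len=3):
    """Repeatedly append any candidate that overlaps the right end of the contig."""
    contig = seed
    pool = list(candidates)
    extended = True
    while extended:
        extended = False
        for seq in pool:
            k = overlap(contig, seq, min_len)
            if k and k < len(seq):  # the candidate must add new bases
                contig += seq[k:]
                pool.remove(seq)
                extended = True
                break
    return contig


# Made-up sequences for illustration only.
seed = "ATGGCGT"
hits = ["GCGTACC", "TACCGGA"]
print(extend(seed, hits))  # ATGGCGTACCGGA
```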

11. A method for extracting information on interactions between
biological entities from natural-language text data, comprising:

(i) parsing the text data to determine the grammatical structure of the
text data; and

(ii) regularizing the parsed text data to form structured word terms.
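The two steps of claim 11 can be illustrated with a deliberately small sketch. It assumes a single "entity verb entity" sentence pattern in place of a full grammatical parser, and a small lookup table that regularizes verb forms to one canonical term; both the pattern and the table are hypothetical simplifications.

```python
import re

# Hypothetical regularization table: surface verb form -> canonical action.
CANONICAL = {
    "activates": "activate",
    "activated": "activate",
    "inhibits": "inhibit",
    "inhibited": "inhibit",
    "binds": "bind",
}

# Toy grammar: one entity, one verb, one entity, optional final period.
PATTERN = re.compile(r"^(\w[\w-]*)\s+(\w+)\s+(\w[\w-]*)\.?$")


def extract(sentence):
    """Parse a sentence and return a structured term, or None if it does not fit."""
    match = PATTERN.match(sentence.strip())
    if not match or match.group(2).lower() not in CANONICAL:
        return None
    agent, verb, target = match.groups()
    return {"action": CANONICAL[verb.lower()], "agent": agent, "target": target}


print(extract("Mdm2 inhibits p53."))
# {'action': 'inhibit', 'agent': 'Mdm2', 'target': 'p53'}
```

Sentences that do not fit the toy pattern return None, which is the situation an error recovery step would address.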

12. The method according to claim 11, further comprising preprocessing
the data prior to parsing, with preprocessing comprising the step of identifying biological
entities.

13. The method according to claim 11, further comprising referring to an
additional parameter which is indicative of the degree to which subphrase parsing is to be
carried out.

14. The method according to claim 11, wherein said parsing step further
comprises segmenting the text data by sentences.
15. The method according to claim 11, wherein said parsing step further
comprises:

segmenting the text data by sentences; and

segmenting each of the sentences at identified words or phrases.

16. The method according to claim 11, wherein said parsing step further
comprises:

segmenting the text data by sentences; and
segmenting each of the sentences at a prefix.

17. The method according to claim 11, wherein said parsing step further
comprises skipping undefined words.

18. The method according to claim 11, wherein said parsing step further
comprises:

identifying one or more binary actions and their relationships; and
identifying one or more arguments associated with the actions.
19. The method according to claim 11, further comprising performing
error recovery when parsing of the text data is unsuccessful.

20. The method according to claim 19, wherein said error recovery step
comprises:

segmenting the text data; and

analyzing the segmented text data to achieve at least a partial parsing of the
unsuccessfully parsed text data.
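The error recovery of claims 19 and 20 can be sketched as a fallback: attempt a full parse first, and only on failure segment the sentence and parse the pieces. The sketch assumes a toy three-word parser and a hypothetical set of segment boundaries (commas, "and", "but"); neither is prescribed by the claims.

```python
import re

# Toy parser standing in for a full grammar: "entity verb entity".
TRIPLE = re.compile(r"^(\w[\w-]*) (activates|inhibits|binds) (\w[\w-]*)$")

# Hypothetical segment boundaries used only when the full parse fails.
BOUNDARY = re.compile(r",\s*|\s+and\s+|\s+but\s+")


def parse(text):
    """Return (agent, action, target) or None when the text does not parse."""
    match = TRIPLE.match(text.strip().rstrip("."))
    return match.groups() if match else None


def parse_with_recovery(sentence):
    """Full parse if possible, otherwise a partial parse of the segments."""
    full = parse(sentence)
    if full is not None:
        return [full]
    segments = BOUNDARY.split(sentence)
    return [parsed for parsed in map(parse, segments) if parsed is not None]


text = "As reported earlier, Mdm2 inhibits p53 and p53 activates p21."
print(parse_with_recovery(text))
# [('Mdm2', 'inhibits', 'p53'), ('p53', 'activates', 'p21')]
```

The introductory phrase is discarded because it cannot be parsed, while the two interaction statements are still recovered.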

21. The method according to claim 11, wherein said tagging step
comprises providing the structured data component in a Standard Generalized Markup
Language (SGML) compatible format.

22. A computer system for extracting information on biological entities
from natural-language text data, comprising:

(i) means for parsing the natural-language text data; and

(ii) means for regularizing the parsed text data to form structured word
terms.
23. The system according to claim 22, further comprising means for
preprocessing the data prior to parsing, with the preprocessing means comprising
identifying biological entities.

24. The system according to claim 22, further comprising means for
referring to an additional parameter which is indicative of the degree to which subphrase
parsing is to be carried out.



25. The system according to claim 22, wherein said parsing means
further comprises means for segmenting the text data by sentences.



26. The system according to claim 22, wherein said parsing means
further comprises:

means for segmenting the text data by sentences; and

means for segmenting each of the sentences at identified words or phrases.

27. The system according to claim 22, wherein said parsing means
further comprises:

means for segmenting the text data by sentences; and

means for segmenting each of the sentences at a prefix.

28. The system according to claim 22, wherein said parsing means
further comprises means for skipping undefined words.

29. The system according to claim 22, wherein said parsing means
further comprises:

means for identifying one or more binary actions and their relationships; and

means for identifying one or more arguments associated with the actions.

30. The system according to claim 22, further comprising means for
performing error recovery when parsing of the text data is unsuccessful.

31. The system according to claim 22, wherein said error recovery
means comprises:

means for segmenting the text data; and

means for analyzing the segmented text data to achieve at least a partial
parsing of the unsuccessfully parsed text data.
32. The system according to claim 22, wherein said tagging means
comprises means for providing the structured data component in a Standard Generalized
Markup Language (SGML) compatible format.